Indexing Reduced Dimensionality Spaces Using Single Dimensional Indexes
Authors
Heng
Abstract
The dimensionality curse has greatly affected the scalability of high-dimensional indexes. A well-known approach to improving the indexing performance is dimensionality reduction before indexing the data in the reduced-dimensionality space. However, the reduction may cause loss of distance information when the data set is not globally correlated. To reduce loss of information and degradation of search quality, cluster based dimensionality reduction should be used instead. In this paper, we present an adaptive local dimensionality reduction (LDR) technique which first identifies effective clusters based on Mahalanobis distance, and for each cluster, performs local dimensionality reduction. The data points in each cluster of the reduced-dimensionality space are then transformed into single distance values with reference to the centroid of the cluster, and indexed using a single dimensional index for nearest neighbor search. Unlike an existing LDR technique which uses an index for each cluster, we use one single B+-tree for the whole data set. Extensive performance studies using both real and synthetic data show that the method achieves higher precision compared to existing global dimensionality reduction and local dimensionality reduction methods, and is more efficient in terms of query performance.
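The mapping from clustered points to single distance values can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the clusters and centroids are already given, skips the Mahalanobis clustering and the local dimensionality reduction, uses plain Euclidean distance, and lets a sorted Python list stand in for the B+-tree. The names `build_index`, `range_search`, and the per-cluster key stride `c` are hypothetical.

```python
import bisect
import math


def build_index(clusters, c):
    """Map every point to one key: cluster_id * c + distance to its centroid.

    clusters: list of (centroid, points) pairs (assumed to be precomputed).
    c: key stride, assumed larger than any point-to-centroid distance.
    Returns (keys, entries), both ordered by key; the sorted list is a
    stand-in for a single B+-tree over the whole data set.
    """
    entries = []
    for cid, (centroid, points) in enumerate(clusters):
        for p in points:
            d = math.dist(p, centroid)
            if d >= c:
                raise ValueError("stride c must exceed every centroid distance")
            entries.append((cid * c + d, p))
    entries.sort(key=lambda e: e[0])
    keys = [k for k, _ in entries]
    return keys, entries


def range_search(keys, entries, clusters, c, query, radius):
    """Return all indexed points within `radius` of `query`.

    For each cluster, the triangle inequality bounds the centroid distance
    of any qualifying point to [dq - radius, dq + radius], which becomes a
    one-dimensional key range. Candidates are then checked exactly.
    """
    results = []
    for cid, (centroid, _) in enumerate(clusters):
        dq = math.dist(query, centroid)
        lo = cid * c + max(0.0, dq - radius)
        hi = cid * c + dq + radius
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_right(keys, hi)
        # Never scan into the key range that belongs to the next cluster.
        end = min(end, bisect.bisect_left(keys, (cid + 1) * c))
        for _, p in entries[start:end]:
            if math.dist(p, query) <= radius:
                results.append(p)
    return results
```

A nearest neighbor search could be layered on top of this by repeating the range search with a growing radius until enough candidates are found; that loop is omitted here.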
Similar resources
Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem
In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces (Ω_d) from which we pick datasets X_d in an i.i.d. fashion. We call the subscript d the dimension of the space Ω_d (e.g. for R^d the dimension is just the usual one) and we allow the si...
MKL-Tree: A Hierarchical Data Structure for Indexing Multidimensional Data
Recently, multidimensional point indexing has generated a great deal of interest in applications where objects are usually represented through feature vectors belonging to high-dimensional spaces and are searched by similarity according to a given example. Unfortunately, although traditional data structures and access methods work well for low-dimensional spaces, they perform poorly as dimensio...
Measuring the Difficulty of Distance-Based Indexing
Data structures for similarity search are commonly evaluated on data in vector spaces, but distance-based data structures are also applicable to non-vector spaces with no natural concept of dimensionality. The intrinsic dimensionality statistic of Chávez and Navarro provides a way to compare the performance of similarity indexing and search algorithms across different spaces, and predict the pe...
Fast Approximate Nearest-Neighbor Queries in Metric Feature Spaces by Buoy Indexing
An indexing scheme for solving the problem of nearest neighbor queries in generic metric feature spaces for content-based retrieval is proposed aiming to break the “dimensionality curse.” The basis for the proposed method is the partitioning of the feature dataset into a fixed number of clusters that are represented by single buoys. Upon submission of a query request, only a small number of clu...
High Dimensional Feature Indexing Using Hybrid Trees
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Traditional multidimensional data structures (e.g., R-tree, KDB-tree, gr...